Efficient Parallelization of Unstructured Reductions on Shared Memory Parallel Architectures

Authors

  • Siegfried Benkner
  • Thomas Brandes
Abstract

This paper presents a new parallelization method for an efficient implementation of unstructured array reductions on shared memory parallel machines with OpenMP. This method is strongly related to parallelization techniques for irregular reductions on distributed memory machines as employed in the context of High Performance Fortran. By exploiting data locality, synchronization is minimized without introducing severe memory or computational overheads as observed with most existing shared memory parallelization techniques.
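
To make the problem concrete, the following is a minimal sketch of an unstructured reduction of the form x[idx[i]] += y[i], written in C with OpenMP. It is not the authors' algorithm, which the abstract does not spell out. The function names reduce_atomic and reduce_owner, the array names, and the block partitioning are all assumptions made for illustration. The first function synchronizes every update; the second borrows the owner-computes idea known from distributed memory compilers, where each thread updates only the block of x it owns and therefore needs no synchronization.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Baseline (illustrative): every update to the shared array x is atomic.
 * Assumes 0 <= idx[i] < length of x for all i. */
void reduce_atomic(double *x, const int *idx, const double *y, size_t n)
{
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++) {
        #pragma omp atomic
        x[idx[i]] += y[i];
    }
}

/* Owner-computes sketch (hypothetical, not the paper's method):
 * x[0..m) is split into contiguous blocks, one per thread. Each thread
 * scans all n contributions but applies only those whose target falls
 * in its own block, so no two threads ever write the same element. */
void reduce_owner(double *x, size_t m, const int *idx, const double *y, size_t n)
{
    #pragma omp parallel
    {
        int nthreads = 1, tid = 0;
#ifdef _OPENMP
        nthreads = omp_get_num_threads();
        tid = omp_get_thread_num();
#endif
        /* Block of x owned by this thread: [lo, hi). */
        size_t lo = m * (size_t)tid / (size_t)nthreads;
        size_t hi = m * (size_t)(tid + 1) / (size_t)nthreads;

        for (size_t i = 0; i < n; i++) {
            size_t target = (size_t)idx[i];
            if (target >= lo && target < hi) {
                x[target] += y[i];   /* local update, no synchronization */
            }
        }
    }
}
```

The owner-computes variant removes all atomics but makes every thread scan every contribution. The abstract states that the paper's method avoids severe computational overheads, so this redundant scan should be read as a simplification in the sketch, not as a property of the published technique.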

Related resources

Parallelization of a Dynamic Unstructured Algorithm Using Three Leading Programming Paradigms

The success of parallel computing in solving real-life computationally-intensive problems relies on their efficient mapping and execution on large-scale multiprocessor architectures. Many important applications are both unstructured and dynamic in nature, making their efficient parallel implementation a daunting task. This paper presents the parallelization of a dynamic unstructured mesh adapta...

The Tera Multithreaded Architecture and Unstructured Meshes

The Tera Multithreaded Architecture (MTA) is a new parallel supercomputer currently being installed at San Diego Supercomputing Center (SDSC). This machine has an architecture quite different from contemporary parallel machines. The computational processor is a custom design and the machine uses hardware to support very fine grained multithreading. The main memory is shared, hardware randomized...

A High Resolution Finite Volume Method for Efficient Parallel Simulation of Casting Processes on Unstructured Meshes

We discuss selected aspects of a new parallel three-dimensional (3-D) computational tool for the unstructured mesh simulation of Los Alamos National Laboratory (LANL) casting processes. This tool, known as Telluride, draws upon robust, high resolution finite volume solutions of metal alloy mass, momentum, and enthalpy conservation equations to model the filling, cooling, and solidification o...

Communication and Optimization Aspects of Parallel Programming Models on Hybrid Architectures

Most HPC systems are clusters of shared memory nodes. Parallel programming must combine the distributed memory parallelization on the node interconnect with the shared memory parallelization inside each node. The hybrid MPI+OpenMP programming model is compared with pure MPI, compiler-based parallelization, and other parallel programming models on hybrid architectures. The paper focuses on b...

Publication date: 2000